APP34 Data Store Recovery (USERGROUP01 and USERGROUP02)

               

Purpose:

 

APP34 = Ultra Messaging for the Enterprise. It is a product of THIRD PARTY () that aims to combine fast multicast message delivery with guaranteed, persistent message delivery. It consists of an APP34 daemon (“listener”) that runs as a service on one or more servers and subscribes to the same topics as all APP34 message senders and receivers. The listeners are not an additional hop in message delivery; instead, each acts as an eavesdropping “store” for all messages delivered. A receiver that thinks it has lost a message goes to the store to retrieve and reprocess it. In ’s implementation, there are primary APP34 stores and backup stores that processes automatically connect to if the primary store goes down. They are NOT redundant stores.
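The store's role as a passive subscriber can be sketched as a small in-memory simulation. This is not the APP34 or THIRD PARTY API: the classes `Topic`, `Store` and `Receiver`, and the use of integer sequence numbers to identify messages, are assumptions made only for illustration. It shows the sender publishing once, the receiver and the store each getting the message directly, and the store later handing back a copy on request.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical model of an APP34 store: a passive subscriber that keeps every message."""
    name: str
    messages: dict = field(default_factory=dict)  # sequence number -> payload

    def on_message(self, seq: int, payload: str) -> None:
        self.messages[seq] = payload

    def retransmit(self, seqs: list) -> dict:
        # Hand back only the requested messages this store actually holds.
        return {s: self.messages[s] for s in seqs if s in self.messages}


@dataclass
class Receiver:
    """Hypothetical receiving application."""
    name: str
    received: dict = field(default_factory=dict)

    def on_message(self, seq: int, payload: str) -> None:
        self.received[seq] = payload


class Topic:
    """Multicast-style fan-out: one publish reaches every subscriber directly,
    so the store is a parallel listener, not an extra hop."""

    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self, sub) -> None:
        self.subscribers.append(sub)

    def publish(self, seq: int, payload: str) -> None:
        for sub in self.subscribers:
            sub.on_message(seq, payload)


topic = Topic()
store, rx = Store("primary"), Receiver("rx")
topic.subscribe(store)
topic.subscribe(rx)
topic.publish(1, "order-1")
rx.received.clear()  # simulate the receiver losing what it had processed
print(store.retransmit([1, 2]))  # {1: 'order-1'}  (2 was never sent)
```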

 

 has implemented two separate APP34 Data Stores, one for each of two separate data paths:

 

1)      GOUP01 (USERGROUP01)

2)      GOUP02 (USERGROUP02)

 

-          By design, there are eight APP34 Data Store servers across both data centers:

 

·         One DC01 server acting as the primary APP34 GOUP01 Data Store, generally storing APP01, APP02, APP03 and APP04 related messages.

·         One DC01 server acting as the backup APP34 GOUP01 Data Store, generally storing APP01, APP02, APP03 and APP04 related messages.

·         One DC02 server acting as the primary APP34 GOUP01 Data Store, generally storing APP01, APP02, APP03 and APP04 related messages.

·         One DC02 server acting as the backup APP34 GOUP01 Data Store, generally storing APP01, APP02, APP03 and APP04 related messages.

 

·         One DC01 server acting as the primary NYSE APP34 GOUP02 Data Store, generally storing APP10, APP11, APP12 and APP13 related messages.

·         One DC01 server acting as the primary NASD APP34 GOUP02 Data Store, generally storing APP10, APP11, APP12 and APP13 related messages.

·         One DC02 server acting as the primary NYSE APP34 GOUP02 Data Store, generally storing APP10, APP11, APP12 and APP13 related messages.

·         One DC02 server acting as the primary NASD APP34 GOUP02 Data Store, generally storing APP10, APP11, APP12 and APP13 related messages.
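The eight-server layout can also be written down as data, which makes it easy to count or filter servers by group and data center. The tuple layout and the `stores_for` helper are hypothetical; real host names are not part of this document.

```python
from collections import Counter

# Hypothetical representation of the eight APP34 Data Store servers listed above.
# Each entry is (data center, group, role); real host names are not given here.
STORES = [
    ("DC01", "GOUP01", "primary"),
    ("DC01", "GOUP01", "backup"),
    ("DC02", "GOUP01", "primary"),
    ("DC02", "GOUP01", "backup"),
    ("DC01", "GOUP02", "primary NYSE"),
    ("DC01", "GOUP02", "primary NASD"),
    ("DC02", "GOUP02", "primary NYSE"),
    ("DC02", "GOUP02", "primary NASD"),
]


def stores_for(group: str, dc: str) -> list:
    """Return the roles of the stores serving one group in one data center."""
    return [role for d, g, role in STORES if g == group and d == dc]


per_group = Counter(group for _, group, _ in STORES)
print(len(STORES), dict(per_group))   # 8 {'GOUP01': 4, 'GOUP02': 4}
print(stores_for("GOUP02", "DC02"))   # ['primary NYSE', 'primary NASD']
```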

 

Use the following hyperlinks to jump to the desired section of the APP34 documentation:

 

UAPP10_Recovery_Considerations

 

UAPP10_ONE_GOUP01_Troubleshooting_Table

UAPP10_ONE_GOUP02_Troubleshooting_Table

UAPP10_BOTH_GOUP01_Troubleshooting_Table

UAPP10_BOTH_GOUP02_Troubleshooting_Table

 


 

APP34 Recovery Considerations:

NOTE: APP34 services are not configured to "move" between nodes. 

Instead, we move the "APP34 registrations" of sending/receiving applications by shutting down APP34 stores.

 

What happens when the sending application goes down?

When the sending application goes down, there is no impact on the “store”. The stores simply have no new messages to listen for until the senders come back up. When a sender comes back up, it resumes its connection to the store and the store continues listening.

 

 

What happens when the receiving application goes down (or thinks it lost messages)?

When the receiving application goes down, it loses its connections to the sending applications and to the “store”. When the receiving application comes back up, it resumes its connection to the store and requests any missed messages. The store delivers those messages and the receiving application reprocesses them. The same message request/retransmission also occurs if the receiving application suffers an unrecoverable loss of messages.
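A receiver can only request messages it knows it missed. A minimal sketch, assuming each message carries an increasing integer sequence number and the receiver notices a gap when a later message arrives (both are assumptions, not confirmed APP34 behavior), might look like this; `find_gaps`, `recover` and the dictionary standing in for the store are hypothetical.

```python
def find_gaps(received: set, last_seq: int, first_seq: int = 1) -> list:
    """Sequence numbers in [first_seq, last_seq] that the receiver never processed."""
    return [seq for seq in range(first_seq, last_seq + 1) if seq not in received]


def recover(received: set, store: dict) -> set:
    """Ask the (hypothetical) store for every gap up to the newest message seen.

    `store` is a plain seq -> payload dict standing in for the APP34 store;
    gaps the store does not hold stay unrecovered.
    """
    if not received:
        return set()
    missing = find_gaps(received, max(received))
    recovered = {seq for seq in missing if seq in store}
    return received | recovered


store = {seq: f"msg-{seq}" for seq in range(1, 9)}
# Receiver was down while 4, 5 and 7 were sent; seeing 8 reveals the gaps.
received = {1, 2, 3, 6, 8}
print(find_gaps(received, max(received)))    # [4, 5, 7]
print(sorted(recover(received, store)))      # [1, 2, 3, 4, 5, 6, 7, 8]
```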

 

 

What happens when the primary APP34 store goes down?

When the primary APP34 store goes down, both the sending and receiving applications detect the lost connections and automatically reconnect to the backup APP34 store. Messages “stored” in the primary are no longer available to the receiving application; only new messages sent to the backup store can be resent (if the receiving process thinks it lost them). 

 

In the  implementation, the stores are not redundant: only one store is “listening” at a time. Because the APP34 stores are not redundant, messages sent/stored by a given APP34 service BEFORE a re-registration cannot be resent AFTER the re-registration. In other words, ONLY messages sent/stored AFTER an APP34 registration is made are available to be resent to a receiver.
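The consequence of non-redundant stores can be shown with a toy model. `StorePair` is a hypothetical class, not part of any product API; it assumes a single active store per registration and that a failed primary's contents become unreachable, as described above.

```python
class StorePair:
    """Hypothetical model of a non-redundant primary/backup APP34 store pair.

    Only the store that senders and receivers are registered with keeps
    messages; the other store is idle until a failover.
    """

    def __init__(self) -> None:
        self.stores = {"primary": {}, "backup": {}}
        self.active = "primary"

    def send(self, seq: int, payload: str) -> None:
        # Only the active (registered) store is listening.
        self.stores[self.active][seq] = payload

    def fail_primary(self) -> None:
        # The primary goes down: its contents become unreachable and
        # everyone re-registers with the backup.
        del self.stores["primary"]
        self.active = "backup"

    def retransmit(self, seq: int):
        # Retransmission can only come from the store currently registered.
        return self.stores[self.active].get(seq)


pair = StorePair()
pair.send(1, "before failover")
pair.fail_primary()
pair.send(2, "after failover")
print(pair.retransmit(1))  # None (stored before re-registration, lost with the primary)
print(pair.retransmit(2))  # after failover
```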

 

 

 

 

 


 

APP34 Troubleshooting Table:

APP34 Symptom

Impacts

Response

If ONLY ONE node running the APP34 GOUP01 Data Store service is having problems:

 

Evidenced by:

-          In EMT, application-specific APP34 registration errors appear that indicate problems with only one APP34 instance, involving APP01, APP10, APP03, APP02 and APP04 related messages.

 

Impacts will be specific to applications involved.

 

The following applications are SENDERS and RECEIVERS of 29west USERGROUP01 related data:

-          APP01                   to APP10

-          APP20BRIDGE    to APP10 and APP20

-          APP12                   to APP02R

-          APP03                   to APP02R

 

1)      Notify Operations Management.

2)      Stop the APP34 GOUP01 service running on the affected node (via NTM Control Utility).

3)      Confirm through EMT and ER that all previously registered APP34 GOUP01 SENDING services re-register with the backup APP34 GOUP01 service.

4)      Confirm through EMT and ER that all previously registered APP34 GOUP01 RECEIVING services re-register with the backup APP34 GOUP01 service.

5)      Wait for all "re-registrations" to complete and give all applications some time to function with the remaining stores.

6)      Wait a reasonable amount of time (arbitrarily, 2-5 minutes) to make sure all applications, including the APP34 GOUP01 stores, are healthy.

7)      Restart the original APP34 GOUP01 service on the original node (APP34 GOUP01 services are not set up to move between nodes).

8)      Confirm that no applications re-register with it, since "re-registrations" only occur when APP34 stores are lost, not when they are brought up.
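The single-node procedure relies on the rule that registrations move only when a store is lost. A hypothetical `Registry` model of that rule (the store names `store-A` and `store-B` are placeholders, and no EMT, ER or NTM Control Utility calls are involved) shows why restarting the original store in step 7 does not pull applications back to it.

```python
class Registry:
    """Hypothetical model of which APP34 store each application is registered with.

    Registrations move only when a store is lost, never when one comes back up.
    """

    def __init__(self, apps: list, primary: str, backup: str) -> None:
        self.backup_of = {primary: backup}
        self.up = {primary, backup}
        self.registered = {app: primary for app in apps}

    def stop(self, store: str) -> None:
        # Losing a store forces its senders/receivers onto the backup (steps 3-4).
        self.up.discard(store)
        for app, current in list(self.registered.items()):
            if current == store:
                self.registered[app] = self.backup_of[store]

    def start(self, store: str) -> None:
        # Bringing a store back does not pull registrations back (steps 7-8).
        self.up.add(store)


reg = Registry(["APP01", "APP20BRIDGE", "APP03"], primary="store-A", backup="store-B")
reg.stop("store-A")
reg.start("store-A")
print(reg.registered)  # {'APP01': 'store-B', 'APP20BRIDGE': 'store-B', 'APP03': 'store-B'}
```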

 


 

APP34 Symptom

Impacts

Response

If ONLY ONE node running the APP34 GOUP02 Data Store service is having problems:

 

Evidenced by:

-          In EMT, application-specific APP34 registration errors appear that indicate problems with only one APP34 instance, involving APP10, APP11, APP12 and APP13 related messages.

 

 

Impacts will be specific to applications involved.

 

The following applications are SENDERS and RECEIVERS of 29west GOUP02 related data:

-          APP10                   to SITE 1 APP11

-          APP10                   to SITE 2 APP11

-          SITE 1 APP11      to SITE 1 APP12

-          SITE 2 APP11      to SITE 2 APP12

-          SITE 1 APP12      to SITE 1 APP13

-          SITE 2    to SITE 2 APP13

 

1)      Notify Operations Management.

2)      Stop the APP34 GOUP02 service running on the affected node (via NTM Control Utility).

3)      Confirm through EMT and ER that all previously registered APP34 GOUP02 SENDING services re-register with the backup APP34 GOUP02 service.

4)      Confirm through EMT and ER that all previously registered APP34 GOUP02 RECEIVING services re-register with the backup APP34 GOUP02 service.

5)      Wait for all "re-registrations" to complete and give all applications some time to function with the remaining stores.

6)      Wait a reasonable amount of time (arbitrarily, 2-5 minutes) to make sure all applications, including the APP34 GOUP02 stores, are healthy.

7)      Restart the original APP34 GOUP02 service on the original node (APP34 GOUP02 services are not set up to move between nodes).

8)      Confirm that no applications re-register with it, since "re-registrations" only occur when APP34 stores are lost, not when they are brought up.
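The "arbitrarily, 2-5 minutes" wait in step 6 can be read as: applications must stay healthy for at least 2 minutes, and the wait gives up after 5. A sketch of that reading follows; it is one possible interpretation, `check` is a hypothetical callable standing in for whatever EMT/ER health view is used, and the fake clock exists only so the example runs instantly.

```python
import time
from typing import Callable


def wait_for_stable_health(
    check: Callable[[], bool],
    min_stable_s: float = 120.0,   # lower end of the 2-5 minute window
    timeout_s: float = 300.0,      # upper end of the 2-5 minute window
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once check() has stayed True for min_stable_s seconds,
    or False if timeout_s passes first."""
    start = clock()
    healthy_since = None
    while clock() - start <= timeout_s:
        if check():
            if healthy_since is None:
                healthy_since = clock()
            if clock() - healthy_since >= min_stable_s:
                return True
        else:
            healthy_since = None  # any failure restarts the stability window
        sleep(interval_s)
    return False


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


fake = FakeClock()
# Hypothetical health signal: unhealthy for the first 60 s, healthy afterwards.
ok = wait_for_stable_health(lambda: fake.now >= 60, clock=fake, sleep=fake.sleep)
print(ok, fake.now)  # True 180.0
```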

 


 

APP34 Symptom

Impacts

Response

If BOTH APP34 GOUP01 Data Stores need to be restarted with the stores cleared (as is done at “start of day”):

 

-          Evidenced by unresolvable 29west problems involving APP01, APP10, APP03, APP02 and APP04 related messages.

 

Impacts will be specific to applications involved.

 

The following applications are SENDERS and RECEIVERS of 29west USERGROUP01 related data:

-          APP01                   to APP10

-          APP20BRIDGE    to APP10

-          APP10                   to APP12, APP03, APP01, APP20B

-          RTC                        to APP02R, APP04

-          APP03                   to APP02R

 

1)      Notify Operations Management.

2)      Halt trading in all stocks.

Refer to Application Recovery - APP10.docx

3)      Stop CMA testing via NTM Control Utility to avoid CMA testing APP10s during recoveries.

4)      Stop all applications that use APP34 stores (via NTM Control Utility).

-          Use the Production OpsMenu Shutdown Menu and APP34 developer input to confirm the list of processes involved and the following shutdown order:

5)      Stop APP01

6)      Stop APP20 Bridge, APP10s, APP03s

7)      Stop APP12

8)      Stop CLR (RTC)

9)      Stop APP02R

10)   Stop APP04

11)   Stop both APP34 services

12)   Rename the APP34 GOUP01 store files from the Production OpsMenu Startup Menu.

13)   Purge the APP34 GOUP01 cache and state files from the Production OpsMenu Startup Menu.

14)   Restart both APP34 GOUP01 services and check the APP34 GOUP01 store log files to confirm startup is “clean”.

15)   Restart all stopped applications in the order in which they appear in the Production OpsMenu Startup Menu.

16)   Confirm through EMT and ER that all startups occur without error.

17)   Perform  System Integrity Checklist.docx with focus on paths utilized by 29west.
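The required shutdown order can be checked mechanically against the order in which processes were actually stopped. This is a hypothetical helper, not an NTM Control Utility feature; the process names come from the steps above, and processes listed on the same step (APP20 Bridge, APP10s, APP03s) are treated as one stage that may stop in any order. The list of processes should still be confirmed from the Production OpsMenu Shutdown Menu and developer input.

```python
# Shutdown stages for the GOUP01 recovery, taken from the steps above.
# Processes listed on the same step form one stage and may stop in any order.
GOUP01_SHUTDOWN_STAGES = [
    {"APP01"},
    {"APP20 Bridge", "APP10", "APP03"},
    {"APP12"},
    {"CLR (RTC)"},
    {"APP02R"},
    {"APP04"},
    {"APP34 GOUP01 services"},
]


def check_shutdown_order(observed: list, stages: list) -> list:
    """Return problems found in an observed stop sequence; an empty list means OK."""
    stage_of = {name: i for i, stage in enumerate(stages) for name in stage}
    problems = []
    highest = -1  # latest stage reached so far
    for name in observed:
        if name not in stage_of:
            problems.append(f"{name}: not in the shutdown list")
            continue
        if stage_of[name] < highest:
            problems.append(f"{name}: stopped after a process from a later stage")
        highest = max(highest, stage_of[name])
    for name in sorted(set(stage_of) - set(observed)):
        problems.append(f"{name}: never stopped")
    return problems


good = ["APP01", "APP03", "APP10", "APP20 Bridge", "APP12",
        "CLR (RTC)", "APP02R", "APP04", "APP34 GOUP01 services"]
print(check_shutdown_order(good, GOUP01_SHUTDOWN_STAGES))  # []
```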

 


 

 

APP34 Symptom

Impacts

Response

If BOTH APP34 GOUP02 Data Stores need to be restarted with the stores cleared (as is done at “start of day”):

 

Evidenced by unresolvable 29west problems involving APP10, APP11, APP12 and APP13 related messages.

Impacts will be specific to applications involved.

 

The following applications are SENDERS and RECEIVERS of 29west GOUP02 related data:

-          APP10                   to SITE 1 APP11

-          APP10                   to SITE 2 APP11

-          SITE 1 APP11      to SITE 1 APP12

-          SITE 2 APP11      to SITE 2 APP12

-          SITE 1 APP12      to SITE 1 APP13

-          SITE 2    to SITE 2 APP13

 

1)      Notify Operations Management.

2)      Halt trading in all stocks.

Refer to Application Recovery - APP10.docx

3)      Stop CMA testing via NTM Control Utility to avoid CMA testing APP10s during recoveries.

4)      Stop all applications that use APP34 stores (via NTM Control Utility).

-          Use the Production OpsMenu Shutdown Menu and APP34 developer input to confirm the list of processes involved and the following shutdown order:

5)      Stop APP10

6)      Stop APP11

7)      Stop APP12

8)      Stop APP13

9)      Stop both APP34 services

10)   Rename the APP34 GOUP02 store files from the Production OpsMenu Startup Menu.

11)   Purge the APP34 GOUP02 cache and state files from the Production OpsMenu Startup Menu.

12)   Restart both APP34 GOUP02 services and check the APP34 GOUP02 store log files to confirm startup is “clean”.

13)   Restart all stopped applications in the order in which they appear in the Production OpsMenu Startup Menu.

14)   Confirm through EMT and ER that all startups occur without error.

15)   Perform  System Integrity Checklist.docx with focus on paths utilized by 29west.
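Because each step depends on the previous one (for example, applications should not be restarted if the store log shows an unclean startup), the procedure can be treated as a sequence that halts at the first failed check. `run_procedure` and the lambda checks below are hypothetical placeholders; in practice each check is a manual confirmation in EMT, ER or the store logs.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus a check that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_procedure(steps: List[Step]) -> Tuple[List[str], str]:
    """Run steps in order and stop at the first failure.

    Returns (names of completed steps, name of the failed step or "" if all passed).
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, ""


# Hypothetical checks; True/False stands in for a manual confirmation.
steps = [
    ("Stop APP10", lambda: True),
    ("Stop APP11", lambda: True),
    ("Stop APP12", lambda: True),
    ("Stop APP13", lambda: True),
    ("Stop both APP34 services", lambda: True),
    ("Rename APP34 GOUP02 store files", lambda: True),
    ("Purge APP34 GOUP02 cache and state files", lambda: True),
    ("Restart APP34 GOUP02 services, log is clean", lambda: False),  # simulated unclean start
    ("Restart all stopped applications", lambda: True),
]
done, failed = run_procedure(steps)
print(len(done), failed)  # 7 Restart APP34 GOUP02 services, log is clean
```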